Ultra-tight GPS/IMU Integration based Long-Range Rocket Projectile Navigation
Accurate navigation is essential for the precise strike capability of long-range rocket projectiles. To obtain a stable, high-performance navigation solution, an ultra-tight Global Positioning System (GPS)/inertial measurement unit (IMU) integrated navigation approach is proposed. In this study, the high-accuracy short-term position output of the IMU is used to assist carrier-phase tracking in the GPS receiver, and the outputs of the IMU and GPS are then fused with a federated filter. A cubature Kalman filter, improved with the strong-tracking principle, replaces the unscented Kalman filter as the local filter, and the federated filter is improved with vector-sharing theory. Finally, simulations based on real ballistic data show, from the estimation-error statistics, that the navigation accuracy of the proposed method is higher than that of the traditional method.
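The local-filter replacement described above can be illustrated with a generic cubature Kalman filter. The sketch below is a minimal, textbook CKF predict/update cycle in Python with NumPy; it omits the strong-tracking modification, the federated fusion, and the vector-sharing weights from the study, and all function names are illustrative.

```python
import numpy as np

def cubature_points(x, P):
    """2n cubature points x +/- sqrt(n) * columns of chol(P), equal weights 1/(2n)."""
    n = x.size
    S = np.linalg.cholesky(P)
    return np.hstack([x[:, None] + np.sqrt(n) * S,
                      x[:, None] - np.sqrt(n) * S])  # shape (n, 2n)

def ckf_step(x, P, f, h, Q, R, z):
    """One predict/update cycle of a basic cubature Kalman filter."""
    n = x.size
    # Predict: propagate cubature points through the dynamics f.
    pts = cubature_points(x, P)
    fp = np.apply_along_axis(f, 0, pts)
    x_pred = fp.mean(axis=1)
    dev = fp - x_pred[:, None]
    P_pred = dev @ dev.T / (2 * n) + Q
    # Update: propagate fresh points through the measurement model h.
    pts = cubature_points(x_pred, P_pred)
    hp = np.apply_along_axis(h, 0, pts)
    z_pred = hp.mean(axis=1)
    dz = hp - z_pred[:, None]
    Pzz = dz @ dz.T / (2 * n) + R
    Pxz = (pts - x_pred[:, None]) @ dz.T / (2 * n)
    K = Pxz @ np.linalg.inv(Pzz)  # Kalman gain
    x_new = x_pred + K @ (z - z_pred)
    P_new = P_pred - K @ Pzz @ K.T
    return x_new, P_new
```

In the linear case the cubature points reproduce the standard Kalman update exactly, which makes the sketch easy to sanity-check on a constant-velocity tracking problem.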
Scene Graph Generation with External Knowledge and Image Reconstruction
Scene graph generation has received growing attention with the advancements
in image understanding tasks such as object detection, attributes and
relationship prediction, etc. However, existing datasets are biased in terms
of object and relationship labels, or often come with noisy and missing
annotations, which makes the development of a reliable scene graph prediction
model very challenging. In this paper, we propose a novel scene graph
generation algorithm with external knowledge and image reconstruction loss to
overcome these dataset issues. In particular, we extract commonsense knowledge
from the external knowledge base to refine object and phrase features for
improving generalizability in scene graph generation. To address the bias of
noisy object annotations, we introduce an auxiliary image reconstruction path
to regularize the scene graph generation network. Extensive experiments show
that our framework can generate better scene graphs, achieving the
state-of-the-art performance on two benchmark datasets: Visual Relationship
Detection and Visual Genome.
Comment: 10 pages, 5 figures, Accepted in CVPR 201
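The commonsense-refinement idea — retrieving knowledge-base embeddings related to an object and folding them back into its feature — can be sketched as an attention-weighted, gated update. This is a hedged toy version, not the paper's architecture (which the abstract does not fully specify); `refine_with_knowledge`, the softmax attention, and the scalar gate are all assumptions for illustration.

```python
import numpy as np

def refine_with_knowledge(obj_feat, kb_embeds, gate_w=0.5):
    """Refine an object feature with retrieved knowledge embeddings.

    obj_feat:  (d,)   object feature vector
    kb_embeds: (k, d) embeddings of retrieved commonsense facts
    gate_w:    scalar gate mixing the knowledge context back in
    """
    # Softmax attention of the object feature over the retrieved facts.
    scores = kb_embeds @ obj_feat
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    kb_context = weights @ kb_embeds  # attention-weighted knowledge summary
    # Gated residual update of the object feature.
    return (1 - gate_w) * obj_feat + gate_w * kb_context
```

A learned gate (or a GRU-style update) would replace the fixed scalar in a real model; the point here is only the retrieve-attend-fuse pattern.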
Unpaired Image Captioning via Scene Graph Alignments
Most of current image captioning models heavily rely on paired image-caption
datasets. However, getting large scale image-caption paired data is
labor-intensive and time-consuming. In this paper, we present a scene
graph-based approach for unpaired image captioning. Our framework comprises an
image scene graph generator, a sentence scene graph generator, a scene graph
encoder, and a sentence decoder. Specifically, we first train the scene graph
encoder and the sentence decoder on the text modality. To align the scene
graphs between images and sentences, we propose an unsupervised feature
alignment method that maps the scene graph features from the image to the
sentence modality. Experimental results show that our proposed model can
generate quite promising results without using any image-caption training
pairs, outperforming existing methods by a wide margin.
Comment: Accepted in ICCV 201
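The unsupervised feature-alignment step maps image-modality scene-graph features into the sentence modality without paired supervision. As a much simpler, distribution-level stand-in for the paper's alignment objective, the sketch below matches only the first two moments of the two (unpaired) feature sets; `moment_align` is a hypothetical helper, not the authors' method.

```python
import numpy as np

def moment_align(img_feats, txt_feats):
    """Map image-modality features toward the text-modality distribution
    by matching the mean and per-dimension standard deviation.
    Works on unpaired sets: only distribution statistics are used."""
    mu_i, sd_i = img_feats.mean(0), img_feats.std(0) + 1e-8
    mu_t, sd_t = txt_feats.mean(0), txt_feats.std(0) + 1e-8
    # Whiten in the image modality, then color in the text modality.
    return (img_feats - mu_i) / sd_i * sd_t + mu_t
```

A learned mapping (e.g. adversarially trained) can align much richer structure than two moments; this toy only illustrates why no image-caption pairs are required for alignment.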
Beneficial Effects of Ethyl Pyruvate through Inhibiting High-Mobility Group Box 1 Expression and TLR4/NF-κB Pathway after Traumatic Brain Injury in the Rat
Ethyl pyruvate (EP) has demonstrated neuroprotective effects against acute brain injury through its anti-inflammatory action. The nuclear protein high-mobility group box 1 (HMGB1) can activate inflammatory pathways when released from dying cells. This study was designed to investigate the protective effects of EP against secondary brain injury in rats after traumatic brain injury (TBI). Adult male rats were randomly divided into three groups: (1) sham + vehicle, (2) TBI + vehicle, and (3) TBI + EP (n = 30 per group). A right parietal cortical contusion was induced using a weight-drop TBI method. In the TBI + EP group, EP was administered intraperitoneally at a dosage of 75 mg/kg at 5 min, 1 h, and 6 h after TBI. Brain samples were harvested 24 h after TBI. We found that EP treatment markedly inhibited the expression of HMGB1 and TLR4, NF-κB DNA-binding activity, and inflammatory mediators such as IL-1β, TNF-α, and IL-6. EP treatment also significantly ameliorated beam-walking performance, brain edema, and cortical apoptotic cell death. These results suggest that the protective effects of EP may be mediated by reduction of the HMGB1/TLR4/NF-κB-mediated inflammatory response in the injured rat brain.
Neural Point Process for Learning Spatiotemporal Event Dynamics
Learning the dynamics of spatiotemporal events is a fundamental problem.
Neural point processes enhance the expressivity of point process models with
deep neural networks. However, most existing methods only consider temporal
dynamics without spatial modeling. We propose Deep Spatiotemporal Point Process
(DeepSTPP), a deep dynamics model that integrates spatiotemporal point
processes. Our method is flexible, efficient, and can accurately forecast
irregularly sampled events over space and time. The key construction of our
approach is the nonparametric space-time intensity function, governed by a
latent process. The intensity function enjoys closed form integration for the
density. The latent process captures the uncertainty of the event sequence. We
use amortized variational inference to infer the latent process with deep
networks. Using synthetic datasets, we validate our model can accurately learn
the true intensity function. On real-world benchmark datasets, our model
demonstrates superior performance over state-of-the-art baselines. Our code and
data can be found at https://github.com/Rose-STL-Lab/DeepSTPP
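The nonparametric space-time intensity can be illustrated with a classic kernel construction: each past event contributes a spatial Gaussian bump whose height decays exponentially in time. This is a hand-written toy intensity, not the paper's latent-process parameterization; the function `intensity` and its parameters are illustrative assumptions.

```python
import numpy as np

def intensity(s, t, events, w, beta, sigma):
    """Toy space-time intensity:
    lambda(s, t) = sum_i w_i * exp(-beta * (t - t_i)) * N(s; s_i, sigma^2 I)
    where events is a list of ((x, y), t_i) pairs and w their weights."""
    lam = 0.0
    for (si, ti), wi in zip(events, w):
        if ti < t:  # only past events contribute
            time_k = np.exp(-beta * (t - ti))
            d2 = np.sum((np.asarray(s) - np.asarray(si)) ** 2)
            space_k = np.exp(-d2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
            lam += wi * time_k * space_k
    return lam
```

Because both the exponential-in-time and Gaussian-in-space kernels integrate in closed form, such a kernel intensity admits closed-form likelihood normalization — the property the abstract highlights for the density.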
Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis
Diffusion-based models have achieved state-of-the-art performance on
text-to-image synthesis tasks. However, one critical limitation of these models
is the low fidelity of generated images with respect to the text description,
such as missing objects, mismatched attributes, and mislocated objects. One key
reason for such inconsistencies is the inaccurate cross-attention to text in
both the spatial dimension, which controls at what pixel region an object
should appear, and the temporal dimension, which controls how different levels
of details are added through the denoising steps. In this paper, we propose a
new text-to-image algorithm that adds explicit control over spatial-temporal
cross-attention in diffusion models. We first utilize a layout predictor to
predict the pixel regions for objects mentioned in the text. We then impose
spatial attention control by combining the attention over the entire text
description and that over the local description of the particular object in the
corresponding pixel region of that object. The temporal attention control is
further added by allowing the combination weights to change at each denoising
step, and the combination weights are optimized to ensure high fidelity between
the image and the text. Experiments show that our method generates images with
higher fidelity compared to diffusion-model-based baselines without fine-tuning
the diffusion model. Our code is publicly available at
https://github.com/UCSB-NLP-Chang/Diffusion-SpaceTime-Attn
Comment: 20 pages, 16 figures
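At each denoising step, the spatial-temporal control described above reduces to blending two cross-attention maps inside the predicted object region with a step-dependent weight. A minimal sketch, assuming the attention maps and the layout mask are given as arrays; `combine_attention` and the scalar `alpha` per step are illustrative simplifications of the optimized combination weights.

```python
import numpy as np

def combine_attention(attn_global, attn_local, mask, alpha):
    """Blend attention over the full text (attn_global) with attention over
    the object's local description (attn_local), but only inside the
    layout-predicted region for that object (boolean mask).

    alpha is the step-dependent combination weight in [0, 1]; in the paper's
    setting it would change across denoising steps and be optimized.
    """
    blended = alpha * attn_local + (1 - alpha) * attn_global
    # Outside the object's region, the global attention is kept unchanged.
    return np.where(mask, blended, attn_global)
```

Calling this once per object and per denoising step, with a different `alpha` each step, captures the spatial control (where the blend applies) and the temporal control (how strongly, at which step).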
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Vision and text have been fully explored in contemporary video-text
foundational models, while other modalities such as audio and subtitles in
videos have not received sufficient attention. In this paper, we
establish connections between multi-modality video tracks, including Vision,
Audio, and Subtitle, and Text by exploring an automatically generated
large-scale omni-modality video caption dataset called VAST-27M. Specifically,
we first collect 27 million open-domain video clips and separately train a
vision and an audio captioner to generate vision and audio captions. Then, we
employ an off-the-shelf Large Language Model (LLM) to integrate the generated
captions, together with subtitles and instructional prompts into omni-modality
captions. Based on the proposed VAST-27M dataset, we train an omni-modality
video-text foundational model named VAST, which can perceive and process
vision, audio, and subtitle modalities from video, and better support various
tasks including vision-text, audio-text, and multi-modal video-text tasks
(retrieval, captioning and QA). Extensive experiments have been conducted to
demonstrate the effectiveness of our proposed VAST-27M corpus and VAST
foundation model. VAST achieves 22 new state-of-the-art results on various
cross-modality benchmarks. Code, model and dataset will be released at
https://github.com/TXH-mercury/VAST
Comment: 23 pages, 5 figures